Transparent runtime parallelization of the R scripting language
Abstract
Scripting languages such as R and Matlab are widely used in scientific data processing. As data volumes and the complexity of analysis tasks grow, sequential data processing with these tools often becomes the bottleneck in scientific workflows. We describe pR, a runtime framework for automatic and transparent parallelization of R, a popular language for statistical computing. Building on the interpreted nature of scripting languages and the usage patterns of data analysis code, we propose several novel techniques: (1) applying parallelizing-compiler technology to runtime, whole-program dependence analysis of scripting languages, (2) incremental code analysis assisted by evaluation results, and (3) runtime parallelization of file accesses. Our framework requires no modification to either the source code or the underlying R implementation. Experimental results demonstrate that pR can exploit both task and data parallelism transparently and, overall, achieves better performance and scalability than an existing parallel R package that requires code modification.
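The first technique, dependence analysis used to expose task parallelism, can be illustrated with a small sketch. This is not pR's implementation: it assumes each top-level R statement has already been reduced to the sets of variable names it reads and writes (the `Stmt` record, `depends`, `schedule_levels`, and `SCRIPT` are hypothetical), and it ignores side effects such as file I/O that a real runtime would also have to track.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """Hypothetical model of one top-level script statement."""
    text: str            # source text, for display only
    reads: frozenset     # variable names the statement reads
    writes: frozenset    # variable names the statement assigns


def depends(earlier: Stmt, later: Stmt) -> bool:
    """True if `later` must run after `earlier`."""
    flow = earlier.writes & later.reads     # read-after-write
    anti = earlier.reads & later.writes     # write-after-read
    output = earlier.writes & later.writes  # write-after-write
    return bool(flow or anti or output)


def schedule_levels(stmts):
    """Group statement indices into waves of mutually independent statements.

    A statement's level is one more than the highest level of any earlier
    statement it depends on, so every wave only needs earlier waves finished.
    """
    level = []
    for j, s in enumerate(stmts):
        preds = [level[i] for i in range(j) if depends(stmts[i], s)]
        level.append(max(preds) + 1 if preds else 0)
    waves = {}
    for idx, lv in enumerate(level):
        waves.setdefault(lv, []).append(idx)
    return [waves[k] for k in sorted(waves)]


# Hypothetical R script, already annotated with read/write sets.
SCRIPT = [
    Stmt('a <- read.table("x.txt")', frozenset(), frozenset({"a"})),
    Stmt('b <- read.table("y.txt")', frozenset(), frozenset({"b"})),
    Stmt("m <- mean(a$v)", frozenset({"a"}), frozenset({"m"})),
    Stmt("n <- mean(b$v)", frozenset({"b"}), frozenset({"n"})),
    Stmt("print(m + n)", frozenset({"m", "n"}), frozenset()),
]

if __name__ == "__main__":
    for wave in schedule_levels(SCRIPT):
        print([SCRIPT[i].text for i in wave])
```

Statements in the same wave have no flow, anti, or output dependences on one another, so a runtime could evaluate them concurrently: here the two file reads form the first wave, the two means the second, and the final `print` the third.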
Similar articles
StarFlow: A Script-Centric Data Analysis Environment
We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and user annotations, (2) command-line tools for exploring and propagating changes through the resulting dependency network, (3) support for workflow abstractions ena...
Oberon Script: A Lightweight Compiler and Runtime System for the Web
Oberon Script is a scripting language and runtime system for building interactive Web Client applications. It is based on the Oberon programming language and consists of a compiler that translates Oberon Script at load-time into JavaScript code, and a small runtime system that detects and compiles script sections written in Oberon Script.
Safe Parallel Programming in an Interpreted Language
Parallel programming is increasingly important with the advent of multicore processors. However, modern software is difficult to parallelize because of the high degree of modularization. It is unclear whether a piece of code is parallel if it calls other functions. Dynamic languages such as Ruby, Python, and Matlab represent modularization to the extreme. A program, also known as a script, requ...
Programming Network Components Using NetPebbles: An Early Report
A network-centric application developer faces a number of challenges, including distributed program design, efficient remote object access, software reuse, and program deployment issues. This level of complexity hinders the developer's ability to focus on the application logic. NetPebbles removes this complexity from the developer through a network-component based scripting environment where remo...
Scripting For Java
Tcl was initially developed as an embeddable command language to provide what we now call "scripting" to complex applications. The "scripting" or "high-level language" approach to providing control of applications from command lines, configuration files, or "macros" has been very successful and a major winning case for Tcl. In the last six years, Java appeared as a programming language and r...
Journal:
- J. Parallel Distrib. Comput.
Volume 71
Publication date: 2011